An interactive motion perception tool for kindergarteners (and vision scientists)

We implement Adelson and Bergen's spatiotemporal energy model, extended to three dimensions (x–y–t), in an interactive tool. It makes early (first-order) visual motion perception easy to understand. We demonstrate its usefulness in explaining an assortment of phenomena, including some that are typically not associated with the spatiotemporal energy model.
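To make the model concrete, the sketch below constructs a quadrature pair of spatiotemporal (x–y–t) Gabor filters of the kind the energy model rests on. It is a minimal illustration, not the tool's actual code: the function name, kernel size, and frequency parameters are illustrative choices.

```python
import numpy as np

def spatiotemporal_gabor_pair(size=15, sf=0.15, tf=0.15, theta=0.0, sigma=4.0):
    """Quadrature pair (even/odd phase) of x-y-t Gabor filters.

    size  : kernel extent in pixels/frames (illustrative choice)
    sf    : spatial frequency in cycles/pixel along direction theta
    tf    : temporal frequency in cycles/frame; its sign selects the
            preferred motion direction
    """
    r = np.arange(size) - size // 2
    t, y, x = np.meshgrid(r, r, r, indexing="ij")  # t-y-x sampling grid
    # Drifting carrier oriented in space-time: a plane wave whose spatial
    # component shifts over frames, i.e., oriented motion in x-y-t
    phase = 2 * np.pi * (sf * (x * np.cos(theta) + y * np.sin(theta)) + tf * t)
    envelope = np.exp(-(x**2 + y**2 + t**2) / (2 * sigma**2))
    return envelope * np.cos(phase), envelope * np.sin(phase)

# Quadrature pairs tuned to opposite horizontal motion directions
right_even, right_odd = spatiotemporal_gabor_pair(tf=+0.15)
left_even, left_odd = spatiotemporal_gabor_pair(tf=-0.15)
```

Squaring and summing the responses of the even and odd filters yields a phase-independent motion energy for that direction, which is the core operation of the Adelson and Bergen model.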

In fact, by simply playing around with the tool, we discovered that spatiotemporal energy models (Adelson & Bergen, 1985) directly explain many more phenomena than previously understood. We applied our tool to illusions available on YouTube and Twitter, as well as to curated lists (Bach, 1997; Shapiro & Todorovic, 2017). For some of them, we also moved the camera to mimic head/eye movements. In Figure 2, we show outputs on an assorted list of phenomena we found in this process. This contrasts with traditional methods of testing a model on visual stimuli: instead of saving image sequences to disk, processing them offline, and generating visualizations of energy for different motion directions over time as a post-processing step, our tool does all of this live.
Take, for example, the stepping feet illusion (Anstis, 2001). In this illusion (Bach, 2004: shows a demo), the yellow and blue "feet" are set against a grating of black-and-white lines. The "feet" are vertically aligned and move smoothly at the same speed, yet they produce an illusion of stepping alternately, as if the yellow foot pauses while the blue foot moves and vice versa. When we point a webcam at this illusion and run our motion perception tool, it directly and immediately reflects our perception (row 1 of Figure 2): the "feet" appear to move in distinct steps, at consecutive times t and t + 1. You may also reduce the contrast of the grating (Bach, 2004: un-tick the "Hi contrast" button) and notice that this weakens the illusion. With this, it is easy to understand the role of contrast in motion perception (Anstis, 2004).
Another example is the Pinna-Brelstaff illusion (Pinna & Brelstaff, 2000). First, let us look at the animated online version of this illusion (Bach, 2003), which simulates the head moving toward/away (looming) with respect to the static pattern. Here, even though the rings are simply expanding or contracting, the pattern elicits an additional illusory rotation for each ring. We can verify this same percept with our tool (row 4 of Figure 2).
[Figure caption: The illusory percept in the Pinna-Brelstaff illusion (Pinna & Brelstaff, 2000) is immediately revealed through the colors encoding motion energy (Adelson & Bergen, 1985). Refer to Figure 2 for more details.]
To understand the visualization, the notion of a "phase" for rings is helpful. On an expanding ring without rotation, every point moves away from the center along the radial line. A rotating ring, on the other hand, produces motion along the tangent at every point. Thus, the visual motion "phase" of a rotating ring relates to that of an expanding ring in the following way: taking the expanding ring as a reference (0°), a clockwise rotating ring has a phase of +90°. In other words, a clockwise rotating ring is a +90° rotation of an expanding ring. Likewise, a counter-clockwise rotating ring has a phase of −90°. Any intermediate phase indicates simultaneous rotation and expansion, as the resultant phase is the vector addition of the motions due to expansion and rotation.
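As a quick sanity check on that vector addition, the hypothetical snippet below combines an equal-magnitude radial (expansion) component and tangential (rotation) component at a point on a ring and recovers the intermediate 45° phase:

```python
import numpy as np

# At some point on the ring, let the radial (expansion) component point
# along 0 deg and the tangential (rotation) component along +90 deg.
radial = np.array([1.0, 0.0])      # pure expansion, phase 0 deg
tangential = np.array([0.0, 1.0])  # pure clockwise rotation, phase +90 deg

resultant = radial + tangential    # simultaneous expansion and rotation
phase = np.degrees(np.arctan2(resultant[1], resultant[0]))
print(phase)  # 45.0 -> an intermediate phase between expansion and rotation
```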
For the animated version of the Pinna-Brelstaff illusion (Bach, 2003), the output of our motion perception tool is a combination of radial and tangential motion. When the rings are expanding, the inner ring has a phase of +45° (illusory clockwise rotation), and the outer ring has a phase of −45° (illusory counter-clockwise rotation). You may further change the angle of the Gabor elements, using the slider on the right, and immediately notice its effect on the strength and direction of the illusory rotations.
[Figure 2 caption: stepping feet (Anstis, 2001; Bach, 2004; Kitaoka & Anstis, 2021), Kinegram (Bach, 2014), structure from motion (Bach, 2002; Rogers & Graham, 1979), Pinna-Brelstaff (Bach, 2003; Pinna & Brelstaff, 2000), translational moiré patterns (Bach, 2013; Spillmann, 1993), spine drift (Bach, 2011; Kitaoka, 2010), grid masking (Bach, 2019), and global motion influenced by arrows (@jagarikin, 2022).]
Static patterns of the Pinna-Brelstaff illusion (Pinna & Brelstaff, 2000) also elicit an illusory percept during translations and rotations of the head. This too can be reproduced with our tool by simply moving the camera, roughly recreating the required head motion. The ability to interact with the visual illusion by moving the camera is powerful, as it allows us to understand action-perception coupling (Rolfs & Schweitzer, 2022) for the case of self-movement and visual motion perception.
The spine drift illusion (Bach, 2011; Kitaoka, 2010: shows a demo) demonstrates such a relationship between eye movements and motion perception (Menshikova & Krivykh, 2016). When we view this illusion, the central square appears to float with respect to the background. We can reproduce this by slightly moving the camera, which simulates fixational eye movements (Rolfs, 2009). This produces perceived motion along two distinct directions (row 6 of Figure 2), indicating that the center square strongly excites motion receptors in a direction different from that of the periphery.
The key to the usefulness of our tool is its ability to run in real time. For this, we use PyTorch (Paszke et al., 2019), a library targeted toward deep learning. It makes low-level accelerated computing routines accessible through a high-level programming language. With a few lines of code, it is easy to apply linear filtering (convolutions) to a sequence of images, a spatiotemporal volume, in real time. For example, our tool serves as a template for a component of an active vision robotic application (Battaje & Brock, 2022) that uses fixation and the resultant motion cues for 3D perception.
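The following is a minimal sketch of that real-time loop, assuming the spatiotemporal_gabor_pair sketch from above is in scope; it is not the tool's actual implementation, and details such as the webcam index and window title are placeholders. It buffers grayscale webcam frames into an x–y–t volume, applies all four filters with a single conv3d call, and displays the opponent energy (one direction's energy minus the opposite direction's, following Adelson & Bergen, 1985).

```python
import cv2
import numpy as np
import torch
import torch.nn.functional as F

# Reuse the hypothetical quadrature pairs from the earlier sketch, stacked
# into one conv3d weight of shape (out_channels=4, in_channels=1, kT, kH, kW)
kernels = np.stack([right_even, right_odd, left_even, left_odd])
weight = torch.from_numpy(kernels).float().unsqueeze(1)

cap = cv2.VideoCapture(0)          # webcam index is machine-dependent
frames = []                        # rolling buffer of grayscale frames
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255
    frames.append(gray)
    if len(frames) < weight.shape[2]:
        continue                   # wait until the buffer spans the kernel
    frames = frames[-weight.shape[2]:]
    # Spatiotemporal volume of shape (batch=1, channels=1, T, H, W)
    volume = torch.from_numpy(np.stack(frames))[None, None]
    out = F.conv3d(volume, weight)             # four filter responses
    energy = out ** 2                          # phase-independent energy
    # Opponent energy: one direction minus its opposite
    opponent = (energy[0, 0] + energy[0, 1]) - (energy[0, 2] + energy[0, 3])
    vis = opponent[0].numpy()                  # single remaining time slice
    cv2.imshow("opponent motion energy", vis / (abs(vis).max() + 1e-6) * 0.5 + 0.5)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```

Because the filter bank enters as a single conv3d weight tensor, swapping in a different set of linear filters only changes the weight construction, which is part of what makes this approach easy to adapt.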
Similarly, we believe the interactive real-time nature of this tool could be extended to other domains. From color perception to the perception of causality and animacy, wherever there are computational models that can be expressed as linear filters (for which computation is fast), it would be easy to implement tools similar to the one described here and immediately "see" the results of a given model.
In conclusion, we present an interactive tool that helps explain early visual motion perception. The setup is simple: a laptop and an external webcam. Using this tool, we can easily explain old, as well as new, visual phenomena. This also works for phenomena that involve physical eye movements. The code is openly available and uses accelerated computing libraries that make it easy to adapt to other, more complex visual perception models. With this, the process of learning and discovery becomes as simple as playing with toys. We hope the vision science community can take advantage of such a method of interactive discovery.